🤖 AI Inference
Model Serving, Inference Optimization, ONNX, Model Deployment
Scoured 10211 posts in 223.0 ms
Building Production-Grade Agentic AI Systems: A Comprehensive Guide
dev.to · 22h · Discuss: DEV
🏗️ AI Infrastructure
AI Workflows with human-in-the-loop
weavemind.ai · 22m · Discuss: Hacker News
🤖 AI Coding Tools
MentorCollab: Selective Large-to-Small Inference-Time Guidance for Efficient Reasoning
arxiv.org · 2d
🔗 CoT Prompting
LAI #113: The Engineering Work That Decides Whether AI Holds Up
pub.towardsai.net · 2d
🏗️ AI Infrastructure
Zero-Latency Local AI: Tuning Your Linux Kernel for LLM Inference 🐧🧠
dev.to · 22h · Discuss: DEV
🏗️ AI Infrastructure
Build an AI RAG Chatbot with n8n, Google Drive & Gemini
lumberjack.so · 56m
🤖 AI Coding Tools
Building the Future with AI That Acts
devxt.com · 9h · Discuss: Hacker News
🧠 AI
The Missing Layer Above AI Inference Governance
vibe.forem.com · 4h · Discuss: DEV
🤖 Anthropic Claude
Sequential Attention: Making AI models leaner and faster without sacrificing accuracy
research.google · 3d · Discuss: Hacker News, r/LocalLLaMA
🏗️ AI Infrastructure
AI Inference Pipelines – Building Low-Latency Systems With gRPC
youtube.com · 3d
🏗️ AI Infrastructure
Understanding LLM Inference Engines: Inside Nano-vLLM (Part 2)
neutree.ai · 1d · Discuss: Hacker News
💻 Local LLMs
The Laziest Conspiracy in AI:
thecynicalnerd.bearblog.dev · 12h
🖥 computers
How we cut Vertex AI latency by 35% with GKE Inference Gateway
cloud.google.com · 1d
🏗️ AI Infrastructure
Seedance2 – multi-shot AI video generation
genstory.app · 16h · Discuss: Hacker News
🎲 Procedural Generation
Show HN: HypothesisHub – An open API where AI agents collaborate on medical res
medresearch-ai.org · 20h · Discuss: Hacker News
🏗️ AI Infrastructure
The control layer for AI
blog.dottxt.ai · 1d · Discuss: Hacker News
🏗️ AI Infrastructure
How I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS
mohammedeabdelaziz.github.io · 17h · Discuss: Hacker News
📱 Edge AI
First Proof | Research-Level Math for AI Evaluation
1stproof.org · 1d · Discuss: Hacker News
🧩 Constraint Programming
Matching the right LLM for your GPU feels like an art, but I finally cracked it
xda-developers.com · 7h
⚡ Hardware Acceleration
Quantization-Aware Distillation
ternarysearch.blogspot.com · 6h · Discuss: Hacker News
💻 Local LLMs